perm filename REWRIT.DOC[L70,TES] blob sn#025984 filedate 1973-02-20 generic text, type T, neo UTF8












                 THE LISP70 PATTERN MATCHING SYSTEM

                         Lawrence G. Tesler
                           Horace J. Enea
                           David C. Smith

                         Stanford University

                           February, 1973







     ABSTRACT

     LISP70 is a descendant of LISP which emphasizes pattern-directed
computation and extensibility.  A function can be defined by a set of
pattern rewrite rules as well as by the normal LAMBDA method.  New
rewrite rules can be added to a previously defined function; thus
LISP70 functions are said to be "extensible".  It is possible to have
the new rules merged into the function automatically so that special
cases are checked before general cases.  Some of the facilities of
the rewrite system are described and a variety of applications are
demonstrated.

     NOTICE

     This is a limited circulation draft of a paper to be submitted
to a conference or journal.  No right to publicize its contents is
granted.
1                                             Tesler, Enea, and Smith


     BACKGROUND




     During the past decade, LISP [16] has been a principal

programming language for artificial intelligence and other frontier

applications of computers.  Like other widely used languages, it has

spawned many variants, each attempting to make one or more

improvements.  Among the aspects that have received particular

attention are notation [1,10,14,21], control

structure [4,11,17,19], data base management [12,17,22], interactive

editing and debugging [24], and execution efficiency.


     A need for a successor to LISP has been recognized [3], and

several efforts in this direction are under way.  The approach being

taken with TENEX-LISP is to begin with an excellent debugging

system [23] and to add on flexible control structure [2] and Algol-

like notation.  The approach taken by LISP70 and by the LISP-related

ECL [26] is to begin with an extensible kernel language which users

can tailor and tune to their own needs.


     "Tailoring" a language means defining facilities which assist in

the solution of particular kinds of problems which may have been

unanticipated by the designers of the kernel.  "Tuning" a language

means specifying more efficient implementations for statements which

are executed frequently in particular programs.
2                                             Tesler, Enea, and Smith


     A language that can be used on only one computer is not of

universal utility; the ability to transfer programs between computers

increases its value.  However, a language that is extensible both

upward and downward is difficult to transport if downward extensions

mention machine-dependent features [7,8].  LISP70 generates code

for an "ideal LISP machine" called "ML" and only the translation from

ML to object machine language is machine-dependent.  Thus, downward

extensions can be factored into a machine-independent and a machine-

dependent part, and during program transfer, the machine-dependent

recoding (if any) is clearly isolated.


     Among the improvements LISP70 makes to LISP are backtrack

control structure [19], streaming [15], pattern-directed computation,

and extensible functions.


     The subjects to be covered in the present paper are pattern-

directed computation and extensible functions.
3                                             Tesler, Enea, and Smith


     PATTERN-DIRECTED COMPUTATION




     Many of the data tranformations performed in LISP applications

are more easily described by pattern matching rules than by

algorithms [12,17,22,25].  In addition, pattern matching rules are

appropriate for the description of input-output conversion, parsing,

and compiling [20].  LISP70 places great emphasis on "pattern rewrite

rules" [5,6,13,27] as an alternative and adjunct to algorithms as a

means of defining functions.


     A brief explanation of rewrite rule syntax and semantics will be

presented with some examples to demonstrate the clarity of the

notation.


     Each rule is of the form DEC→REC.  The DEC (decomposer) is

matched against the "input stream".  If it matches, then the REC

(recomposer) generates the "output stream".


     A literal in a pattern is represented by itself if it is an

identifier or number, or preceded by a quote (') if it is a special

character.

          RULES OF SQUARE =
                  2 → 4,
                  5 → 25,
                  12 → 144 ;
4                                             Tesler, Enea, and Smith


     A private variable of the rule is represented by an identifier

prefixed by a colon (:); it may be bound to only one value during

operation of the rule.

          RULES OF EQUAL =
                  :X :X → T,
                  :X :Y → NIL ;


     A list is represented by a pair of parentheses surrounding the

representations of its elements.  A segment of zero or more elements

is represented by an ellipsis symbol (...).

          RULES OF CAR =
                  (:X ...) → :X ;

          RULES OF CDR =
                  (:X ...) → (...) ;

          RULES OF CONS =
                  :X (...) → (:X ...) ;

          RULES OF ATOM =
                  (:X ...) → NIL,
                  :X → T ;

          RULES OF APPEND =
                  (...) (...) → (... ...) ;


     If a segment needs a name, it is represented by an identifier

prefixed by a double-colon (::).

          RULES OF ASSOC =
                  :X (... (:X ::Y) ...) → (:X ::Y),
                  :X (...) → NIL ;


     A function F can be called in a pattern, passing it a single

argument ARG, by the construct: ARG@F (there are also ways of passing

several arguments to a function).
5                                             Tesler, Enea, and Smith


          RULES OF LENGTH =
                  ( )       →  0,
                  (:X ...)  →  (...) @LENGTH @ADD1 ;
6                                             Tesler, Enea, and Smith


     LIST STRUCTURE TRANSFORMATIONS




     The following set of rules defines a function MOVE_BLOCK of

three arguments: a block to be moved, a location to which it should

be moved, and a representation of the current world.  The function

moves block :B from its current location in the world to location

:TO, and the transformed representation of the world is returned.

          RULES OF MOVE_BLOCK =

                  :B :TO (... (:TO ... :B ...) ...)
                       → (... (:TO ... :B ...) ...),

                  :B :TO (... (... :B ...) ... (:TO ...   ) ...)
                       → (... (...    ...) ... (:TO ... :B) ...),

                  :B :TO (... (:TO ...   ) ... (... :B ...) ...)
                       → (... (:TO ... :B) ... (...    ...) ...),

                  :B :TO (... (... :B ...) ...)
                       → (... (...    ...) ... (:TO :B)),

                  :B :TO (...)
                       → (BLOCK :B NOT IN (...)) @ERROR ;

In the first case, the block is already where it belongs, so the

world does not change; in the second, the block is moved to the

right; in the third, to the left; in the fourth, the location :TO

does not exist yet and is created; in the last case, :B is not in the

world and the ERROR routine is called.


     Functions such as MOVE_BLOCK have been used in a simple planning

program written by one of the authors.  Imagine writing MOVE_BLOCK as
7                                             Tesler, Enea, and Smith


an algorithm; it would require the use of auxiliary functions or of a

PROG with state variables and loops.  Bugs would be more likely in

the algorithm because its operation would not be so transparent.
8                                             Tesler, Enea, and Smith


     REPLACEMENT




     If F is any function, then the construct <F> occurring in a DEC

pattern signifies "replacement".  This means that F is invoked to

translate a substream of the input stream, and that substream is

replaced by its translation.  The altered input stream can then

continue to be matched by the pattern to the right of <F>.


     The following example is from the MLISP compiler, which calls

itself recursively to translate the condition and arms of an IF-

statement to LISP:

          RULES OF MLISP =

                  IF <MLISP>:X THEN <MLISP>:Y ELSE <MLISP>:Z
                          → (COND (:X :Y) (T :Z)),

                  IF <MLISP>:X THEN <MLISP>:Y
                          → (COND (:X :Y) (T NIL)),

                  IF <MLISP>:X
                          → (MISSING THEN) @ERROR,

                  IF      → (ILLEGAL EXPRESSION AFTER IF) @ERROR ;


     Here is another example.  The predicate PALINDROME is true iff

the input stream is a mirror image of itself, i.e., if the left and

right ends are equal and the middle is itself a palindrome.
9                                             Tesler, Enea, and Smith


          RULES OF PALINDROME =   :X      →       T,

                                :X :X     →       T,

                  :X <PALINDROME>T :X     →       T,

                                  ...     →       NIL ;


     The replacement facility provides both the non-terminal symbols

of syntactic parsing and the "actors" of PLANNER [12].
10                                            Tesler, Enea, and Smith


     EXTENSIBLE FUNCTIONS




     New rules may be added to an existing set of rewrite rules under

program control; thus, any compiler table or any other system of

rewrite rules can be extended by the user.  For this reason, a set of

rewrite rules is said to be an "extensible function".  The "ALSO"

clause is used to add cases to an extensible function:

          RULES OF MLISP ALSO =

                  IF <MLISP>:X THEN <MLISP>:Y ELSE
                          → (MISSING EXPRESSION AFTER ELSE) @ERROR,

                  IF <MLISP>:X THEN
                          → (MISSING EXPRESSION AFTER THEN) @ERROR ;


     Extensions can be made effective throughout the program or only

in the current block, as the user wishes.


     A regular LAMBDA function can also be extended.  Its bound

variables are considered analogous to a DEC and its body analogous to

a REC.  Accordingly, the compiler converts it to an equivalent

rewrite function of one rule before extending it.
11                                            Tesler, Enea, and Smith


     THE EXTENSIBLE COMPILER




     To make an extensible compiler practical, the casual user must

be able to understand how it works in order to change it.  To

demonstrate that it is not inordinately difficult to understand the

LISP70 compiler, those rules which get involved in translating a

particular statement from MLISP to LAP/PDP-10 are shown below.  A

simplified LISP70 (typeless and unhierarchical) is used in the

examples, but the real thing is not much more complicated.  The

statement to be translated is:

                  IF A < B THEN C ELSE D

The rules invoked in the MLISP-to-LISP translator are:

          RULES OF MLISP =

                  IF <MLISP>:X THEN <MLISP>:Y ELSE <MLISP>:Z
                                  → (COND (:X :Y) (T :Z)),

                  :X  '<  :Y      →       (LESSP :X :Y),

                  :VAR            →       :VAR ;


     The LISP-to-ML compiler below utilizes the following feature: if

a colon variable occurs in the REC but it did not occur in the DEC,

an "existential value" (which is something like a generated symbol)

is bound to it.  Here, the existential value is used as a compiler-

generated label.
12                                            Tesler, Enea, and Smith


          RULES OF COMPILER =

                  (COND (T :E))
                        → :E @COMPILER,

                  (COND (:B :E) ...)
                        → :B @COMPILER
                          (BRANCH_FALSE :ELSE)
                          :E @COMPILER
                          (BRANCH :OUT)
                          (LABEL :ELSE)
                          (COND ...) @COMPILER
                          (LABEL :OUT),

                  (LESSP :A :B)
                        → :A @COMPILER
                          (PUSH_DOWN)
                          :B @COMPILER
                          (COMPARE LESS),

                  :V    → (LOAD :V) ;


     The ML-to-LAP translator below assumes that the value register

of the ideal machine is represented on the PDP-10 by a register named

"VAL", that there is a single stack based on register "P", and that

variables can be accessed from fixed locations in memory.
13                                            Tesler, Enea, and Smith


          RULES OF ML =

                  (BRANCH_FALSE :LBL)
                        → (JUMPE VAL :LBL),

                  (BRANCH :LBL)
                        → (JRST :LBL),

                  (LABEL :LBL)
                        → :LBL,

                  (PUSH_DOWN)
                        → (PUSH P VAL),
           
                  (COMPARE LESS)
                        → (CAMGE VAL 0 P)
                          (TDZA VAL VAL)
                          (MOVEI VAL 1)
                          (POP P),
           
                  (LOAD :V)
                        → (MOVE VAL :V) ;


     The code generated is thus:

                  (MOVE VAL A)
                  (PUSH P VAL)
                  (MOVE VAL B)
                  (CAMGE VAL 0 P)
                  (TDZA VAL VAL)
                  (MOVEI VAL 1)
                  (POP P)
                  (JUMPE VAL E0001)
                  (MOVE VAL C)
                  (JRST E0002)
          E0001   (MOVE VAL D)
          E0002

Peephole optimization guided by another rewrite function can reduce

this to six instructions.
14                                            Tesler, Enea, and Smith


     AUTOMATIC ORDERING OF REWRITE RULES




     In most pattern matchers, candidate patterns to match an input

stream are tried either in order of appearance on a list or in an

essentially random order not obvious to the programmer.  LISP70 tries

matches in an order specified by an "ordering function" associated

with each set of rewrite rules.


     One common ordering is "BY APPEARANCE", which is appropriate

when the programmer wants conscious control of the ordering.  Another

is "BY SPECIFICITY", which is useful in left-to-right parsers and

other applications where the compiler can be trusted to order the

rules so that more specific cases are tried before more general ones.

When neither of these standard functions is appropriate, the

programmer can define and use specialized ordering functions, or can

extend SPECIFICITY to meet the special requirements.


     Automatic ordering is a great convenience for a user who is

extending a compiler, a natural language parser, or an inference

system.  It can eliminate the need to study the existing rules simply

to determine where to position a new rule.  Ordering functions can

also be designed to detect inconsistencies and ambiguities and to

discover opportunities for generalization of similar rules.
15                                            Tesler, Enea, and Smith


     As an example, take the LISP-TO-ML translator "COMPILER", which

includes the following rule for the intrinsic function PLUS (slightly

simplified for presentation):

          RULES OF COMPILER =

                  (PLUS :X :Y)
                        → :X@COMPILER
                          (PUSH_DOWN)
                          :Y@COMPILER
                          (ARITHMETIC ADD) ;


     To add special cases to the compiler for sums including the

constant zero, the user could include the following declaration in a

program:

          RULES OF COMPILER ALSO =

                  (PLUS :X 0) → :X@COMPILER,

                  (PLUS 0 :X) → :X@COMPILER ;


     The compiler is ordered by SPECIFICITY, which knows that the

literal 0 is more specific than the variable :X or :Y.  Therefore,

both of the new rules would be ordered before the original PLUS rule.

Suppose the added rules were placed after the general rule; then the

original rule would get first crack at every input stream, and sums

with zero would not be processed as special cases.
16                                            Tesler, Enea, and Smith


     AN ORDERING FUNCTION




     The complete definition of the ordering function SPECIFICITY is

beyond the scope of this paper.  It works roughly as follows.

Comparing DEC patterns by a left-to-right scan, it considers literals

more specific than variables, a colon variable at its second

occurrence more specific than one at its first occurrence, and a

function call with an "@" more specific than a variable but less

specific than a literal.  The specificity of a replacement <F> is

that of the most general rule in the function F.


     A DEC with an ellipsis is considered to expand to multiple rules

in which the ellipsis is replaced by 0, 1, 2, 3, ... ∞ consecutive

variables.  The specificity of each expanded rule is considered

separately.  Observe that between two expansions of an elliptic rule

some other rewrite rule of intermediate specificity may lie.

Example:

          RULES OF SILLY =
                  A ... B ... C   →       1,
                  A B :X :Y       →       2 ;

Two of the expansions of the first rule are:

                  A B :X C        →       1,
                  A :Z B C        →       1,

and the second rule of SILLY comes between these in specificity.
17                                            Tesler, Enea, and Smith


     SPECIFICITY is itself defined by a system of rewrite rules.  To

give a flavor of how this is done, a very simplified SPECIFICITY will

be defined.  It takes two arguments (DEC patterns translated to LISP

notation) and returns them in the proper order.

          RULES OF SPECIFICITY =

                  (COLON :V) (LITERAL :L)
                          → (ORDER (LITERAL :L) (COLON :V)),

                  (LITERAL :L) (COLON :V)
                          → (ORDER (LITERAL :L) (COLON :V)) ;
18                                            Tesler, Enea, and Smith


     OTHER FACILITIES AND APPLICATIONS




     Other facilities of the rewrite system include side-conditions,

conjunctive match, disjunctive match, non-match, repetition,

evaluation of LISP and MLISP expressions, look-ahead, look-behind,

and reversible rules.


     Out of rewrite functions, it is easy to define systems of

inference rules, assertions, and beliefs.  LISP70 has facilities for

retrieving either all or the first of the assertions in a set of

rules that match a given pattern.


     Rewrite rules are a great help in natural language

understanding, whether the methods used are based on phrase structure

grammar, features, keywords, or word patterns.  A use of LISP70 with

the latter method is described in a companion paper [9].
19                                            Tesler, Enea, and Smith


     CONCLUSIONS




     Many of the design decisions of LISP70 are contrary to trends

seen in other "successors to LISP".  The goals of these languages are

similar, but their means are often quite diverse.


     Concern with good notation does not have to compromise the

development of powerful facilities; indeed, good notation can make

those facilities more convenient to use.


     Emphasis on pattern-directed computation does not overly

constrain the programmer accustomed to writing algorithms.  Rewrites

and algorithms can be mixed, and the most appropriate means of

defining a given function can be selected.


     LISP70 does not limit the use of pattern rewrite rules to a few

facilities like goal-achievement and assertion-retrieval.  A set of

rules can be applied to arguments like any other function, and can

stream data from any type of structure or process to any other.


     Automatic ordering does not prevent the programmer from seizing

control, but allows him to relinquish control to a procedure of his

choosing to save him tedious study of an existing program when making

extensions.
20                                            Tesler, Enea, and Smith


     The LISP70 kernel is being debugged on a PDP-10 at the time of

this writing (February, 1973).  The language has already been used

successfully in programs for question-answering and planning.  After

the kernel can reliably compile itself, extensions are planned to

improve its control structure, editing, and debugging capabilities,

and versions may be bootstrapped to other computers.
21                                            Tesler, Enea, and Smith


                             REFERENCES
                             __________





[1] Abrahams, P. W. et al, "The LISP 2 Programming Language and
     System", Proc. AFIPS FJCC 29 (1966), 661-676
              _____ _____ ____ __

[2] Bobrow, D. G. and Wegbreit, B., "A Model and Stack Implementation
     of Multiple Environments", Report No. 2334 (March 1972), Bolt,
     Beranek, and Newman

[3] Bobrow, D. G., "Requirements for Advanced Programming Systems for
     List Processing", Comm. ACM 15, 7 (July 1972), 618-627
                       _____ ___ __

[4] Burstall, R.M., Collins, J.S. and Popplestone, R.J., Programming
                                                         ___________
     in Pop-2, University Press, Edinburg, Scotland (1971), 279-282
     __ _____

[5] Colby, K. M. and Enea, H., "Heuristic Methods for Computer
     Understanding of Natural Language in Context Restricted On-Line
     Dialogues", Math. Biosciences 1 (1967), 1-25
                 _____ ___________ _

[6] Colby, K. M., Watt, J., and Gilbert, J. P., "A Computer Method of
     Psychotherapy", J. of Nervous and Mental Disease 142 (1966),
                     __ __ _______ ___ ______ _______ ___
     148-152

[7] Dickman, B. N., "ETC: An Extensible Macro Based Compiler",
     Proc. AFIPS SJCC 38 (1971), 529-538
     _____ _____ ____ __

[8] Duby, J. J., "Extensible Languages: A Potential User's Point of
     View", in [18], pp.137-140

[9] Enea, H., Colby, K. M., and Moravec, H., "Idiolectic Language-
     Analysis for Understanding Doctor-Patient Dialogues", (submitted
     to IJCAI)

[10] Enea, H., "MLISP (IBM 360/67)", Computer Science Technical
     Report CS 92 (1968), Stanford University

[11] Hewitt, C., PLANNER: A Language for Manipulating Models and
                 ________ _ ________ ___ ____________ ______ ___
     Proving Theorems in a Robot, Ph.D. Thesis (Feb 1971), MIT
     _______ ________ __ _ _____

[12] Hewitt, C., "Procedural Embedding of Knowledge in PLANNER",
     Proc. IJCAI 2 (1971), 167-182
     _____ _____ _

[13] Kay, A., "FLEX, A Flexible Extendible Language", CS Tech. Report
     (1968), U. of Utah
22                                            Tesler, Enea, and Smith


[14] Landin, P. J., "The Next 700 Programming Languages", CACM 9, 3
                                                          ____ _
     (March 1966), 157-166

[15] Leavenworth, B. M., "Definition of Quasi-Parallel Control
     Functions in a High-Level Language", Proc. Int'l. Comp. Symp.
                                          _____ ______ _____ _____
     (Bonn, 1970)

[16] McCarthy, J., "Recursive Functions of Symbolic Expressions and
     their Computation by Machine, Part I", Comm. ACM 3, 4
                                            _____ ___ _
     (April 1960), 184-195

[17] Rulifson, J. F., Waldinger, R. J., and Derksen, J. A., QA4, A
                                                            ____ _
     Language for Writing Problem-Solving Programs, Proc. IFIP
     ________ ___ _______ _______________ ________  _____ ____
     (1968), TA-2, 111-115

[18] Schuman, S., ed., "Proceedings of the International Symposium on
     Extensible Languages", ACM SIGPLAN Notices 6, 12 (Dec. 1971)
                            ___ _______ _______ _

[19] Smith, D. and Enea, H., "Backtracking in MLISP-2", (submitted to
     IJCAI)

[20] Smith, D. and Enea, H., "MLISP2 -- A Programming Language for
     Writing and Debugging Translators", (forthcoming)

[21] Smith, D., "MLISP (PDP-10)", Artificial Intelligence Memo No.
     135, Stanford University, Oct. 1970

[22] Sussman, G. J. and McDermott, D. V., "Why Conniving is Better
     than Planning", Proc. AFIPS FJCC 41 (1972), 1171-1180
                     _____ _____ ____ __

[23] Teitelman, W. et al, BBN-LISP Reference Manual, (July 1971),
                          ________ _________ ______
     Bolt, Beranek, and Newman

[24] Teitelman, W., "Toward a Programming Laboratory", Proc. IJCAI 1
                                                       _____ _____ _
     (1969), 1-8

[25] Teitelman, W., Design and Implementation of FLIP, a LISP Format
                    ______ ___ ______________ __ _____ _ ____ ______
     Directed List Processor, Scientific Report No. 10 (July 1967),
     ________ ____ _________
     Bolt, Beranek, and Newman

[26] Wegbreit, B., "The ECL Programming System", Proc. AFIPS FJCC 39
                                                 _____ _____ ____ __
     (1971), 253-262

[27] Weizenbaum, J., "ELIZA -- A computer Program for the Study of
     Natural Communication Between Man and Machine", Comm. ACM 9, 1
                                                     _____ ___ _
     (Jan. 1966), 36-45",